20 research outputs found

    Hybrid model and structured sparsity for under-determined convolutive audio source separation

    No full text
    We consider the problem of extracting the source signals from an under-determined convolutive mixture, assuming known filters. We start from its formulation as a minimization of a convex functional, combining a classical ℓ2 discrepancy term between the observed mixture and the one reconstructed from the estimated sources, and a sparse regularization term on the source coefficients in a time-frequency domain. We then introduce a first kind of structure, using a hybrid model. Finally, we embed the previously introduced Windowed-Group-Lasso operator into the iterative thresholding/shrinkage algorithm, in order to take into account some structures inside each layer of the time-frequency representations. Intensive numerical studies confirm the benefits of such an approach.
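To make the "ℓ2 discrepancy plus sparse regularization" formulation above concrete, here is a minimal sketch of the plain iterative shrinkage/thresholding algorithm (ISTA) for the simplified problem min_x 0.5*||y - A x||^2 + lam*||x||_1. This is not the paper's method: the matrix `A`, the weight `lam`, and the helper names are illustrative assumptions, there is no time-frequency transform or hybrid model, and the ordinary element-wise soft threshold stands in for the Windowed-Group-Lasso operator.

```python
import numpy as np

def soft_threshold(z, tau):
    """Element-wise soft thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def objective(A, y, x, lam):
    """Value of 0.5 * ||y - A x||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((y - A @ x) ** 2) + lam * np.sum(np.abs(x))

def ista(A, y, lam, n_iter=200):
    """Plain ISTA: a gradient step on the l2 term, then soft thresholding."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient of the l2 term
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Toy under-determined problem (assumed data): 20 observations, 50 unknowns, 3 non-zeros.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.5, -2.0, 1.0]
y = A @ x_true
x_hat = ista(A, y, lam=0.1)
```

A structured variant would replace `soft_threshold` with an operator that thresholds each coefficient according to the energy of its neighbourhood, which is the role the abstract assigns to the Windowed-Group-Lasso.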

    Multimodal Transformer Using Cross-Channel attention for Object Detection in Remote Sensing Images

    Full text link
    Object detection in Remote Sensing Images (RSI) is a critical task for numerous applications in Earth Observation (EO). Unlike general object detection, object detection in RSI has specific challenges: 1) the scarcity of labeled data in RSI compared to general object detection datasets, and 2) the small objects presented in a high-resolution image with a vast background. To address these challenges, we propose a multimodal transformer exploring multi-source remote sensing data for object detection. Instead of directly combining the multimodal input through a channel-wise concatenation, which ignores the heterogeneity of different modalities, we propose a cross-channel attention module. This module learns the relationship between different channels, enabling the construction of a coherent multimodal input by aligning the different modalities at the early stage. We also introduce a new architecture based on the Swin transformer that incorporates convolution layers in non-shifting blocks while maintaining fixed dimensions, allowing for the generation of fine-to-coarse representations with a favorable accuracy-computation trade-off. The extensive experiments prove the effectiveness of the proposed multimodal fusion module and architecture, demonstrating their applicability to multimodal aerial imagery. Comment: submitted to ICASSP202

    Context Normalization for Robust Image Classification

    Full text link
    Normalization is a pre-processing step that converts the data into a more usable representation. Within deep neural networks (DNNs), the batch normalization (BN) technique uses normalization to address the problem of internal covariate shift. It can be packaged as a general module, which has been extensively integrated into various DNNs to stabilize and accelerate training, presumably leading to improved generalization. However, the effect of BN depends on the mini-batch size, and it does not take into account any groups or clusters that may exist in the dataset when estimating population statistics. This study proposes a new normalization technique, called context normalization, for image data. This approach adjusts the scaling of features based on the characteristics of each sample, which improves the model's convergence speed and performance by adapting the data values to the context of the target task. The effectiveness of context normalization is demonstrated on various datasets, and its performance is compared to other standard normalization techniques.
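The limitation of batch statistics mentioned above can be illustrated with a small sketch. It contrasts standard batch normalization (without learnable scale and shift) with a per-group variant that normalizes each sample using the statistics of its own group. The per-group function is only one possible reading of "taking groups or clusters into account": the name `normalize_by_context`, the integer `context_ids`, and the toy data are assumptions, and the abstract does not say how the paper defines or estimates contexts.

```python
import numpy as np

def batch_normalize(X, eps=1e-5):
    """Normalize each feature with the mean and variance of the whole batch."""
    mean = X.mean(axis=0)
    var = X.var(axis=0)
    return (X - mean) / np.sqrt(var + eps)

def normalize_by_context(X, context_ids, eps=1e-5):
    """Hypothetical per-context variant: statistics are computed per group."""
    out = np.empty_like(X, dtype=float)
    for c in np.unique(context_ids):
        mask = context_ids == c
        out[mask] = batch_normalize(X[mask], eps)
    return out

# Toy batch (assumed data): two clusters with very different feature means.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 4)),
               rng.normal(10.0, 1.0, size=(100, 4))])
context_ids = np.repeat([0, 1], 100)

Z_bn = batch_normalize(X)                      # each cluster stays offset from zero
Z_ctx = normalize_by_context(X, context_ids)   # each cluster is centred separately
```

With batch-wide statistics the two clusters remain separated on either side of zero, whereas the per-context version centres and scales each cluster on its own.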

    Séparation aveugle de source : de l'instantané au convolutif

    No full text
    Blind source separation (BSS) consists of estimating the source signals only from the observed mixtures. The problem can be divided into two categories according to the mixing model: instantaneous mixtures, where delay and reverberation (multi-path effect) are not taken into account, and convolutive mixtures, which are more general but more complicated. Moreover, additive noise at the sensor level and the underdetermined setting, where there are fewer sensors than sources, make the problem even more difficult. In this thesis, we first studied the link between two existing methods for instantaneous mixtures: independent component analysis (ICA) and sparse component analysis (SCA). We then proposed a new formulation that works in both determined and underdetermined cases, with and without noise. Numerical evaluations show the advantage of the proposed approaches. Secondly, the proposed formulation is generalized to convolutive mixtures with speech signals. By integrating a new approximation model, the proposed algorithms work better than existing methods, especially in noisy and/or highly reverberant scenarios. Then, we take into account the technique of morphological decomposition and the use of structured sparsity, which leads to algorithms that can better exploit the structures of audio signals. Such approaches are tested on underdetermined convolutive mixtures in a non-blind scenario. Finally, building on the NMF (non-negative matrix factorization) model, we combined the low-rank and sparsity assumptions and proposed new approaches for under-determined convolutive mixtures. The experiments illustrate the good performance of the proposed algorithms for music signals, especially in strong reverberation scenarios.
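The two mixing models named in this abstract can be written down in a few lines. The sketch below is illustrative only: the function names, array shapes, and random toy data are assumptions, and noise is omitted. In the instantaneous model each sensor observes a weighted sum of the sources, x(t) = A s(t); in the convolutive model each source reaches each sensor through a filter, so the weights become impulse responses and the products become convolutions.

```python
import numpy as np

def instantaneous_mix(A, S):
    """A: (M, N) mixing matrix, S: (N, T) sources -> (M, T) mixtures."""
    return A @ S

def convolutive_mix(H, S):
    """H: (M, N, K) filters from source n to sensor m, S: (N, T) sources.

    Returns the (M, T + K - 1) mixtures x_m = sum_n h_mn * s_n, where * is
    linear convolution.
    """
    M, N, K = H.shape
    T = S.shape[1]
    X = np.zeros((M, T + K - 1))
    for m in range(M):
        for n in range(N):
            X[m] += np.convolve(H[m, n], S[n])
    return X

# Under-determined toy setting (assumed data): M = 2 sensors, N = 3 sources.
rng = np.random.default_rng(0)
S = rng.standard_normal((3, 1000))
A = rng.standard_normal((2, 3))
H = rng.standard_normal((2, 3, 16))  # 16-tap filters standing in for reverberation

X_inst = instantaneous_mix(A, S)
X_conv = convolutive_mix(H, S)
```

With filters of length one the convolutive model reduces to the instantaneous one, which is why the latter is described as the less general case.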

    Blind source separation : from instantaneous to convolutive

    No full text
    Blind source separation (BSS) consists of estimating the source signals only from the observed mixtures. The problem can be divided into two categories according to the mixing model: instantaneous mixtures, where delay and reverberation (multi-path effect) are not taken into account, and convolutive mixtures, which are more general but more complicated. Moreover, additive noise at the sensor level and the underdetermined setting, where there are fewer sensors than sources, make the problem even more difficult. In this thesis, we first studied the link between two existing methods for instantaneous mixtures: independent component analysis (ICA) and sparse component analysis (SCA). We then proposed a new formulation that works in both determined and underdetermined cases, with and without noise. Numerical evaluations show the advantage of the proposed approaches. Secondly, the proposed formulation is generalized to convolutive mixtures with speech signals. By integrating a new approximation model, the proposed algorithms work better than existing methods, especially in noisy and/or highly reverberant scenarios. Then, we take into account the technique of morphological decomposition and the use of structured sparsity, which leads to algorithms that can better exploit the structures of audio signals. Such approaches are tested on underdetermined convolutive mixtures in a non-blind scenario. Finally, building on the NMF (non-negative matrix factorization) model, we combined the low-rank and sparsity assumptions and proposed new approaches for under-determined convolutive mixtures. The experiments illustrate the good performance of the proposed algorithms for music signals, especially in strong reverberation scenarios.

    Revisiting Sparse ICA from a Synthesis Point of View: Blind Source Separation for Over and Underdetermined Mixture

    No full text
    This paper studies the existing links between two approaches to Independent Component Analysis (ICA), projection pursuit and Infomax/maximum likelihood estimation, and Sparse Component Analysis (SCA), mainly used in Generalized Morphological Component Analysis (GMCA), to tackle the Blind Source Separation (BSS) problem for instantaneous mixtures. While ICA methods are well suited to overdetermined and noiseless mixtures, SCA (via GMCA) has demonstrated its robustness to noise. Using the "synthesis" point of view to reformulate ICA methods as an optimization problem, we propose a new optimization framework which encompasses both approaches. We show that the algorithms developed to minimize the proposed functional, built on SCA but imposing a numerical decorrelation constraint on the sources, aim to improve the Signal to Interference Ratio (SIR) of the estimated sources without degrading the Signal to Distortion Ratio (SDR).

    Reverberant Audio Blind Source Separation via Local Convolutive Independent Vector Analysis

    No full text
    In this paper, we propose a new formulation of the blind source separation problem for audio signals with convolutive mixtures, to improve the separation performance of Independent Vector Analysis (IVA). The proposed method benefits from both the recently investigated convolutive approximation model and the IVA approaches that take advantage of cross-band information to avoid permutation alignment. We first exploit the link between IVA and Sparse Component Analysis (SCA) methods through structured sparsity. We then propose a new framework by combining the convolutive narrowband approximation and the Windowed-Group-Lasso (WGL). The optimisation of the model is based on an alternating optimisation approach in which the convolutive kernel and the source components are jointly optimised.

    Sparsity and low-rank amplitude based blind Source Separation

    No full text
    This paper presents a new method for the blind source separation problem in reverberant environments with more sources than microphones. Based on the sparsity property in the time-frequency domain and the low-rank assumption on the spectrogram of each source, the STRAUSS (SparsiTy and low-Rank AmplitUde based Source Separation) method is developed. Numerical evaluations show that the proposed method outperforms the existing multichannel NMF approaches, although it is exclusively based on amplitude information.

    Vers une approche unifiée pour la séparation aveugle de sources en sur et sous-déterminé, basée sur la parcimonie et la décorrélation

    Get PDF
    Independent component analysis (ICA) has been a major tool for blind source separation (BSS). Both theoretical and practical evaluations have shown that the independence hypothesis is well suited to audio signals. In the last few years, optimization approaches based on sparsity have emerged as another efficient tool for BSS. This paper starts by introducing new BSS methods that take advantage of both decorrelation (which is a direct consequence of independence) and sparsity in an overcomplete Gabor representation. It is shown that the proposed methods work in both under-determined and over-determined cases. Experimental results illustrate the good performance of these approaches on audio mixtures.